Ex-10:

Apply (a) the power map technique (q = 1.5 and 2.5) and (b) the ensemble technique (500-member data ensemble for each epoch, dimension: 38 × 40) to Nikkei stock market data with epoch lengths of 40 days each. Compare the results with those obtained in Ex-9.
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
In [2]:
# We open the data and make the ticker as the index
df = pd.read_csv('Nikkei-2010-2012.csv',index_col="Ticker")
df.head()
Out[2]:
2010-01-04 2010-01-05 2010-01-06 2010-01-07 2010-01-08 2010-01-12 2010-01-13 2010-01-14 2010-01-15 2010-01-18 ... 2012-12-14 2012-12-17 2012-12-18 2012-12-19 2012-12-20 2012-12-21 2012-12-25 2012-12-26 2012-12-27 2012-12-28
Ticker
4151.t 999.0 989.0 1003.0 984.0 991.0 984.0 985.0 990.0 989.0 975.0 ... 845.0 848.0 852.0 861.0 850.0 852.0 856.0 850.0 851.0 849.0
4502.t 3850.0 3870.0 3870.0 3930.0 3900.0 3940.0 3930.0 3970.0 3970.0 3945.0 ... 3805.0 3835.0 3845.0 3870.0 3875.0 3865.0 3865.0 3865.0 3860.0 3855.0
4503.t 694.0 700.0 700.0 700.0 702.0 700.0 696.0 704.0 702.0 697.0 ... 797.0 800.0 809.0 812.0 802.0 792.0 797.0 798.0 780.0 775.0
4506.t 979.0 984.0 991.0 982.0 981.0 977.0 979.0 982.0 980.0 965.0 ... 999.0 1013.0 1019.0 1026.0 1012.0 1010.0 1021.0 1032.0 1033.0 1035.0
4507.t 2003.0 2007.0 2007.0 1957.0 1930.0 1931.0 1904.0 1968.0 1957.0 1926.0 ... 1349.0 1382.0 1443.0 1462.0 1461.0 1457.0 1454.0 1469.0 1479.0 1437.0

5 rows × 736 columns

Calculate the normal returns

For each stock, we define a new time series that holds the returns instead of the prices. It is defined as $r_{i+1} = (p_{i+1} - p_{i})/p_{i}$, where $p_{i}$ is the price on day $i$.

In [3]:
### First we will compute the returns matrix
# We define the function "returns"

# This function takes as input a df where each row is a time series
def returns(df):
    mat = np.array(df)
    # T is the number of columns, that is, the number of times measured
    T = len(mat[0])
    
    # We define a new data frame, which will have the returns.
    new_df = df.copy()
    # We delete the first column of the new data frame, since there are no
    # returns there.
    new_df.drop(columns=new_df.columns[0], 
        axis=1, 
        inplace=True)
    
    
    #Iterate over every column
    for i in range(T-1):
        #We overwrite the column of the new_df with the returns
        new_df.iloc[:,i] = (df.iloc[:,i+1]-df.iloc[:,i])/df.iloc[:,i]
    
    
    return(new_df)

df_returns = returns(df)

df_returns.head()
Out[3]:
2010-01-05 2010-01-06 2010-01-07 2010-01-08 2010-01-12 2010-01-13 2010-01-14 2010-01-15 2010-01-18 2010-01-19 ... 2012-12-14 2012-12-17 2012-12-18 2012-12-19 2012-12-20 2012-12-21 2012-12-25 2012-12-26 2012-12-27 2012-12-28
Ticker
4151.t -0.010010 0.014156 -0.018943 0.007114 -0.007064 0.001016 0.005076 -0.001010 -0.014156 -0.003077 ... 0.001185 0.003550 0.004717 0.010563 -0.012776 0.002353 0.004695 -0.007009 0.001176 -0.002350
4502.t 0.005195 0.000000 0.015504 -0.007634 0.010256 -0.002538 0.010178 0.000000 -0.006297 -0.002535 ... 0.000000 0.007884 0.002608 0.006502 0.001292 -0.002581 0.000000 0.000000 -0.001294 -0.001295
4503.t 0.008646 0.000000 0.000000 0.002857 -0.002849 -0.005714 0.011494 -0.002841 -0.007123 -0.004304 ... -0.001253 0.003764 0.011250 0.003708 -0.012315 -0.012469 0.006313 0.001255 -0.022556 -0.006410
4506.t 0.005107 0.007114 -0.009082 -0.001018 -0.004077 0.002047 0.003064 -0.002037 -0.015306 0.006218 ... 0.007056 0.014014 0.005923 0.006869 -0.013645 -0.001976 0.010891 0.010774 0.000969 0.001936
4507.t 0.001997 0.000000 -0.024913 -0.013797 0.000518 -0.013982 0.033613 -0.005589 -0.015841 0.004673 ... -0.010997 0.024463 0.044139 0.013167 -0.000684 -0.002738 -0.002059 0.010316 0.006807 -0.028398

5 rows × 735 columns
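
As a cross-check of the return definition, the same quantity can be computed without an explicit loop. The following is a minimal sketch on a toy price table; the tickers `A` and `B`, the column labels and the helper name `simple_returns` are made up for illustration and are not part of the Nikkei data set.

```python
import numpy as np
import pandas as pd

# Toy prices: rows are (hypothetical) tickers, columns are dates
prices = pd.DataFrame(
    [[100.0, 110.0, 99.0],
     [50.0, 50.0, 55.0]],
    index=["A", "B"],
    columns=["d0", "d1", "d2"],
)

def simple_returns(p):
    # r_{i+1} = (p_{i+1} - p_i) / p_i along each row
    vals = p.to_numpy(dtype=float)
    r = (vals[:, 1:] - vals[:, :-1]) / vals[:, :-1]
    # Label each return by the later of the two dates, as in the loop version
    return pd.DataFrame(r, index=p.index, columns=p.columns[1:])

toy_returns = simple_returns(prices)
print(toy_returns)
```

The result has one column fewer than the price table, which matches the drop from 736 to 735 columns seen above.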

Epochs

Now we make a function that divides the time horizon into epochs of length L.

In [5]:
def epoch(df, L):
    #Number of epochs:
    N = df.shape[1]//L
    
    #A list with all the epochs we will create:
    epochs = []
    
    for i in range(N):
        ep = df.iloc[:,i*L:(i+1)*L]
        epochs.append(ep)
        
    return(epochs)

# List of epochs for the returns, with a time of 40
epochs_returns_40 = epoch(df_returns,40)
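
The integer division in the epoch count means that any trailing columns that do not fill a whole epoch are left out. A small sketch on a toy frame with the same width as `df_returns` (735 columns) makes this explicit; the helper name `split_epochs` and the toy values are assumptions used only for illustration.

```python
import numpy as np
import pandas as pd

def split_epochs(frame, L):
    # Same slicing rule as epoch(): n_columns // L full windows
    n_epochs = frame.shape[1] // L
    return [frame.iloc[:, k * L:(k + 1) * L] for k in range(n_epochs)]

# Toy frame with 3 rows and 735 columns, mirroring the width of df_returns
toy = pd.DataFrame(np.arange(3 * 735).reshape(3, 735))
toy_epochs = split_epochs(toy, 40)

n_epochs = len(toy_epochs)                 # 735 // 40
n_dropped = toy.shape[1] - n_epochs * 40   # trailing columns left out
print(n_epochs, n_dropped)
```

With 735 return columns and L = 40 this gives 18 full epochs, and the last 15 trading days do not belong to any epoch.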

Power map

We construct the correlation matrix of each epoch, apply the power map to it, and then plot the resulting matrix together with its eigenvalue distribution.

In [6]:
## Function that, given a matrix, applies the power map element-wise
def powermap(C,q):
    # sign(C_ij) * |C_ij|**q for every entry; returns a new array
    return np.sign(C) * np.abs(C)**q
In [7]:
# Function that, given the epochs, applies the power map and
# plots all the correlation matrices and the eigenvalue distributions
def plots_power(epochs,q=1.5):
    for i in range(len(epochs)):
        #Select the ith epoch
        ep = epochs[i]
        
        #Print the dates
        print("Dates: From ", ep.columns[0], " to ", ep.columns[-1])
        print("Correlation matrix:")
        
        #Get the correlation matrix
        C = np.array(ep.T.corr())
        
        #Apply the power map
        C = powermap(C,q)
        
        # Plot the correlation matrix
        plt.imshow(C,vmin=-1,vmax=1,
          cmap="seismic")
        
        plt.colorbar()
        plt.show()
        
        print("Eigenvalue Distribution of the correlation mat:")
        #C is symmetric, so eigvalsh returns real eigenvalues directly
        #(eigvals can add a tiny imaginary part from numeric errors)
        eig = np.linalg.eigvalsh(C)
        plt.hist(eig, density = True, bins = 100, log=True)
        plt.show()
        
        

Plot the correlation matrices for the returns data with the power map applied with $q=1.5$.

In [8]:
plots_power(epochs_returns_40,1.5)
Dates: From  2010-01-05  to  2010-03-03
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2010-03-04  to  2010-04-30
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2010-05-06  to  2010-06-30
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2010-07-01  to  2010-08-27
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2010-08-30  to  2010-10-28
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2010-10-29  to  2010-12-28
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2010-12-29  to  2011-02-28
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2011-03-01  to  2011-04-26
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2011-04-27  to  2011-06-27
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2011-06-28  to  2011-08-23
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2011-08-24  to  2011-10-21
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2011-10-24  to  2011-12-20
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2011-12-21  to  2012-02-20
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2012-02-21  to  2012-04-17
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2012-04-18  to  2012-06-15
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2012-06-18  to  2012-08-13
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2012-08-14  to  2012-10-10
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2012-10-11  to  2012-12-06
Correlation matrix:
Eigenvalue Distribution of the correlation mat:

Plot the correlation matrices for the returns data with the power map applied with $q=2.5$.

In [9]:
plots_power(epochs_returns_40,2.5)
Dates: From  2010-01-05  to  2010-03-03
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2010-03-04  to  2010-04-30
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2010-05-06  to  2010-06-30
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2010-07-01  to  2010-08-27
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2010-08-30  to  2010-10-28
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2010-10-29  to  2010-12-28
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2010-12-29  to  2011-02-28
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2011-03-01  to  2011-04-26
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2011-04-27  to  2011-06-27
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2011-06-28  to  2011-08-23
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2011-08-24  to  2011-10-21
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2011-10-24  to  2011-12-20
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2011-12-21  to  2012-02-20
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2012-02-21  to  2012-04-17
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2012-04-18  to  2012-06-15
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2012-06-18  to  2012-08-13
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2012-08-14  to  2012-10-10
Correlation matrix:
Eigenvalue Distribution of the correlation mat:
Dates: From  2012-10-11  to  2012-12-06
Correlation matrix:
Eigenvalue Distribution of the correlation mat:

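Compared with the results of Ex-9, the correlations are generally lower in magnitude for $q=1.5$ and even lower for $q=2.5$. This is to be expected: the power map replaces each entry $C_{ij}$ by $\mathrm{sign}(C_{ij})\,|C_{ij}|^q$, and for $0<|C_{ij}|<1$ and $q>1$ we have $|C_{ij}|^q < |C_{ij}|$, so every off-diagonal entry shrinks, and weak correlations shrink proportionally more than strong ones.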

Furthermore, the eigenvalue distribution is less degenerate for $q=1.5$, and even less so for $q=2.5$, and the outlier eigenvalue is smaller.
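
Both effects can be reproduced on synthetic data. The sketch below is an illustration and not the Nikkei analysis: it uses independent Gaussian series with more series than time steps (the sizes `N = 60` and `T = 20` are arbitrary assumptions), so that the Pearson correlation matrix has rank at most `T - 1` and therefore many zero eigenvalues. It then counts how many eigenvalues remain numerically zero after the power map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one epoch: N series of length T with N > T
N, T = 60, 20
X = rng.standard_normal((N, T))
C = np.corrcoef(X)  # N x N correlation matrix, rank at most T - 1

def power_map(C, q):
    # Element-wise sign(C_ij) * |C_ij|**q
    return np.sign(C) * np.abs(C) ** q

def count_zero_eigs(M, tol=1e-8):
    # Number of eigenvalues that are numerically zero
    return int(np.sum(np.abs(np.linalg.eigvalsh(M)) < tol))

zeros_before = count_zero_eigs(C)
zeros_after = {q: count_zero_eigs(power_map(C, q)) for q in (1.5, 2.5)}

# Largest off-diagonal magnitude before and after the map
off = ~np.eye(N, dtype=bool)
max_before = np.abs(C[off]).max()
max_after = {q: np.abs(power_map(C, q)[off]).max() for q in (1.5, 2.5)}

print(zeros_before, zeros_after)
print(max_before, max_after)
```

The matrix produced by the power map is not guaranteed to be positive semi-definite, so some of its eigenvalues can be slightly negative; this is why the count uses the absolute value.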

Ensemble Technique (500 member data ensemble for each epoch, dimension: 38 × 40)

Each epoch is 40 days long, so it contains one time series of length 40 per stock (one per row of the data frame). In the ensemble technique, we randomly choose 38 of these time series to get a matrix of dimension 38 × 40. We repeat that 500 times to get an ensemble of 500 such matrices. Then we compute the eigenvalues of the corresponding 38 × 38 correlation matrices and pool the eigenvalues of all 500 matrices into a single histogram.
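
The procedure described above can be sketched on synthetic data as follows. This is only an illustration under stated assumptions: the epoch is replaced by independent Gaussian noise, the number of stocks `n_stocks = 200` is a made-up value, and the rows are drawn without replacement so that no series appears twice in the same member.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for one epoch: n_stocks rows, 40 columns (days)
n_stocks, T = 200, 40
epoch_data = rng.standard_normal((n_stocks, T))

def ensemble_eigenvalues(data, members=500, m=38):
    eigs = []
    for _ in range(members):
        # m distinct rows; duplicates would give perfectly correlated rows
        rows = rng.choice(data.shape[0], size=m, replace=False)
        C = np.corrcoef(data[rows])          # m x m correlation matrix
        eigs.append(np.linalg.eigvalsh(C))   # real eigenvalues, ascending
    return np.concatenate(eigs)

all_eigs = ensemble_eigenvalues(epoch_data)
print(all_eigs.shape)
```

Because each correlation matrix has ones on its diagonal, the eigenvalues of every ensemble member sum to 38, which gives a quick sanity check on the pooled array.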

In [28]:
# Number of columns (trading days) in the price data; the number of stocks is len(df)
len(df.iloc[0,:])
Out[28]:
736
In [31]:
# Function to take m rows (stocks) out of a data frame
def choose_rows(A,m=38):
    # First, choose m distinct numbers from 0 to len(A)-1 (len(A) is the number
    # of rows) to select which rows we take. replace=False avoids duplicated
    # rows, which would be perfectly correlated with each other.
    nums = np.random.choice(len(A), m, replace=False)
    
    # Now take those rows from A and call it A_sel
    
    A_sel = A.iloc[nums,:]
    
    return(A_sel)

# Function to create an ensemble of `members` matrices from matrix A 
# by choosing submatrices given by m random rows
def ensemble(A, members=500,m=38):
    ens = []
    for i in range(members):
        ens.append(choose_rows(A,m))
        
    return(ens)


# Define a function that takes an ensemble, computes the eigenvalues of the correlation matrix
# of each element of the ensemble
# and creates an array with all of them
def eigenvals(ens):
    eigs = []
    
    #Iterate over the ensemble
    for mat in ens:        
        #Define correlation matrix
        C = np.array(mat.T.corr())
        
        #Get eigenvalues (C is symmetric, so eigvalsh returns real values
        #without the extremely tiny imaginary part that eigvals can add)
        eigs_mat = np.linalg.eigvalsh(C)
        
        #Append eigenvalues to the complete list
        eigs.extend(eigs_mat)

            
    return(eigs)
In [32]:
# Function that given the epochs, applies the ensemble
# technique to each epoch and graphs the eigenvalue distribution

def plots_ensemble(epochs,members = 500,m=38):
    for i in range(len(epochs)):
        #Select the ith epoch
        ep = epochs[i]
        print("Dates: From ", ep.columns[0], " to ", ep.columns[-1])
        
        #Create the ensemble of `members` matrices with m rows each.
        ens = ensemble(ep,members=members,m=m)
        
        #Create the eigenvalue distribution of this ensemble
        eig  = eigenvals(ens)
        

        print("Eigenvalue Distribution of the correlation matrices of the ensemble:")

        #Make them real (they are already real, but 
        #numeric errors add a small imaginary part and
        #when plotting it like that, it returns a warning)
        eig = np.real(eig)
        plt.hist(eig, density = True, bins = 100, log=True)
        plt.show()
        
plots_ensemble(epochs_returns_40)
Dates: From  2010-01-05  to  2010-03-03
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2010-03-04  to  2010-04-30
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2010-05-06  to  2010-06-30
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2010-07-01  to  2010-08-27
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2010-08-30  to  2010-10-28
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2010-10-29  to  2010-12-28
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2010-12-29  to  2011-02-28
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2011-03-01  to  2011-04-26
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2011-04-27  to  2011-06-27
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2011-06-28  to  2011-08-23
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2011-08-24  to  2011-10-21
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2011-10-24  to  2011-12-20
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2011-12-21  to  2012-02-20
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2012-02-21  to  2012-04-17
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2012-04-18  to  2012-06-15
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2012-06-18  to  2012-08-13
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2012-08-14  to  2012-10-10
Eigenvalue Distribution of the correlation matrices of the ensemble:
Dates: From  2012-10-11  to  2012-12-06
Eigenvalue Distribution of the correlation matrices of the ensemble:

We see that the eigenvalue distribution is no longer as degenerate as the one in Ex-9. Furthermore, the distribution separates into two parts: the first decays quickly, and the second corresponds to outliers and has a shape similar to a semicircle.
